reinforcement learning algorithm in a sentence
Example sentences
- Risk-sensitive reinforcement learning algorithms with generalized average criterion.
- A reinforcement learning algorithm based on process reward and prioritized sweeping is presented as a conflict-resolution strategy for multi-robot systems.
- (4) A new multi-agent cooperation model called MACM is presented, and based on this model an improved distributed reinforcement learning algorithm is also proposed.
- In the first chapter of this paper, a comprehensive survey of research on reinforcement learning theory, algorithms, and applications is provided; the current state and future directions of mobile robot navigation control are also discussed.
- Reinforcement learning has been applied successfully in single-agent environments. Due to the theoretical limitation that the environment is assumed to be Markovian, an assumption that no longer holds in multi-agent systems, traditional reinforcement learning algorithms cannot be applied directly to multi-agent cooperative learning.
- In this paper, introducing joint actions into traditional reinforcement learning, a new multi-agent reinforcement learning algorithm based on behavior prediction is presented, and several feasible methods for predicting other agents' behaviors are discussed.
由于蟻群算法與人工智能中的強(qiáng)化學(xué)習(xí)算法之間有著某種聯(lián)系,同時(shí)強(qiáng)化學(xué)習(xí)近年來也應(yīng)用于求解調(diào)度問題,因此本文也涉及到了一些強(qiáng)化學(xué)習(xí)的主要算法。 - Reinforcement learning algorithms that use cerebellar model articulation controller ( cmac ) are studied to estimate the optimal value function of markov decision processes ( mdps ) with continuous states and discrete actions . the state discretization for mdps using sarsa - learning algorithms based on cmac networks and direct gradient rules is analyzed . two new coding methods for cmac neural networks are proposed so that the learning efficiency of cmac - based direct gradient learning algorithms can be improved
在求解離散行為空間markov決策過程( mdp )最優(yōu)策略的增強(qiáng)學(xué)習(xí)算法研究方面,研究了小腦模型關(guān)節(jié)控制器( cmac )在mdp行為值函數(shù)逼近中的應(yīng)用,分析了基于cmac的直接梯度算法對(duì)mdp狀態(tài)空間離散化的特點(diǎn),研究了兩種改進(jìn)的cmac編碼結(jié)構(gòu),即:非鄰接重疊編碼和變尺度編碼,以提高直接梯度學(xué)習(xí)算法的收斂速度和泛化性能。 - By means of the proposed reinforcement learning algorithm and modified genetic algorithm , neural network controller whose weights are optimized could generate time series small perturbation signals to convert chaotic oscillations of chaotic systems into desired regular ones . the computer simulations on controlling henon map and logistic chaotic system have demonstrated the capacity of the presented strategy by suppressing lower periodic orbits such as period - 1 and period - 2 . meanwhile , the periodic control methodology is utilized , the higher periods such as period - 4 can also be successfully directed to expected periodic orbits
該控制方法無需了解系統(tǒng)的動(dòng)態(tài)特性和精確的數(shù)學(xué)模型,也不需監(jiān)督學(xué)習(xí)所要求的訓(xùn)練數(shù)據(jù),通過增強(qiáng)學(xué)習(xí)訓(xùn)練方式,采用改進(jìn)遺傳算法優(yōu)化神經(jīng)網(wǎng)絡(luò)權(quán)系數(shù),使之成為混沌控制器,便可產(chǎn)生控制混沌系統(tǒng)的時(shí)間序列小擾動(dòng)信號(hào),仿真實(shí)驗(yàn)結(jié)果表明它不僅能有效鎮(zhèn)定混沌周期1 、 2等低周期軌道,而且在周期控制技術(shù)基礎(chǔ)上,也可成功將高周期混沌軌道(如周期4軌道)變成期望周期行為。 - L3ased on the organization rules of internet data , the distribution laws of hyperlinks and the name rules of url , a algorithm of tvm rebuilding is established , and satisfactory experiment results are obtained by applying this algorithm . furthermore , efforts are made by applying of tvm on browse navigation , web page classification and reinforcement learning algorithm
結(jié)合互聯(lián)網(wǎng)資源的構(gòu)建規(guī)則、鏈接分布規(guī)律和url命名規(guī)則,論文提出了樹藤共生數(shù)據(jù)模型的重建算法,實(shí)驗(yàn)結(jié)果驗(yàn)證了樹藤共生模型的有效性與合理性,在此基礎(chǔ)上初步討論了樹藤共生模型在瀏覽導(dǎo)航、網(wǎng)頁分類和reinforcementlearning算法中的應(yīng)用。